library(tidyverse)
library(magrittr)
library(dplyr)
library(gganimate)
library(av)
library(gifski)
library(ggthemes)
library(babynames) # using the babynames package

To start off, we’re using a package called babynames. I used it in a previous data science class in high school, so I’m pretty familiar with it.
It has the variables year, sex, name, n, prop.
year: the year of the name
sex: the sex of the baby, either M for male or F for female in this dataset
name: the name of the baby
n: the number of babies with that name in that year
prop: the proportion of babies of that sex born in that year who were given that name

# Rank girl names by mean yearly proportion (averaged over the years each name appears)
babynames %>%
  filter(sex == "F") %>%
  group_by(name) %>%
  summarize(Mean_Proportion = mean(prop)) %>%
  arrange(desc(Mean_Proportion))

# Keep only the five top-ranked girl names
fb <- babynames %>%
  filter(sex == "F") %>%
  filter(name %in% c("Mary", "Elizabeth", "Margaret", "Helen", "Anna"))

# Animated line chart: transition_reveal() draws each line year by year
ggplot(fb, aes(x = year,
               y = prop,
               group = name,
               color = factor(name))) +
  geom_line() +
  theme_bw() +
  labs(x = "Year", y = "Proportion with the Name",
       title = "Top 5 Popular Girl Names", subtitle = "1880-2017") +
  #scale_color_viridis_d() +
  #scale_x_log10() +
  transition_reveal(year) -> fba
anim_save('girl.gif', animation = fba)

Above, we have an animation of the top 5 baby names for girls, measured by mean proportion over the entire time span (Mary, Elizabeth, Margaret, Helen, and Anna, as the table above showed), from 1880 to 2017. Mary is more popular than the rest at the start in 1880 but dips sharply as we approach 2017. Helen and Margaret peak near the 1920s but drop dramatically afterward. Anna and Elizabeth mostly decrease over this span.
An interesting pattern is that, over time, all of these names have become less common. A few possible explanations come to mind. Perhaps, as the years have gone on, parents have made a deliberate effort to give their children uncommon names. Another thought is that the racial makeup of the United States has likely changed greatly from 1880 to 2017, and different demographic groups may prefer names beyond these five. If so, as the population of non-white Americans increased, the proportion of babies given these particular names would have decreased.
I also get the feeling that this is not just about race: anecdotally, I don’t happen to know many Helens or Elizabeths. This could require a generational breakdown. These top 5 girl names really drop towards the 1960s, and maybe some other names start to skyrocket after the 1960s.
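One rough way to check whether these five names really lost ground together is to add up their proportions for each year. The sketch below is not part of the original analysis; it assumes the babynames and dplyr packages loaded earlier, and the object names girl_names and top5_share are made up for illustration.

```r
library(dplyr)
library(babynames)

girl_names <- c("Mary", "Elizabeth", "Margaret", "Helen", "Anna")

# Combined share of girls given one of the five names, per year
top5_share <- babynames %>%
  filter(sex == "F", name %in% girl_names) %>%
  group_by(year) %>%
  summarize(share = sum(prop))

# Compare the first and last years in the data
top5_share %>%
  filter(year %in% range(year))
```

If the combined share falls steadily, the decline is shared across all five names rather than one name simply replacing another within the group.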
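# Rank boy names by mean yearly proportion (averaged over the years each name appears)
babynames %>%
  filter(sex == "M") %>%
  group_by(name) %>%
  summarize(Mean_Proportion = mean(prop)) %>%
  arrange(desc(Mean_Proportion))

# Keep only the five top-ranked boy names
mb <- babynames %>%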
  filter(sex == "M") %>%
  filter(name %in% c("John", "James", "William", "Robert", "Charles"))
ggplot(mb, aes(x = year,
               y = prop,
               group = name,
               color = factor(name))) +
  geom_line() +
  theme_bw() +
  labs(x = "Year", y = "Proportion with the Name",
       title = "Top 5 Popular Boy Names", subtitle = "1880-2017") +
  #scale_color_viridis_d() +
  #scale_x_log10() +
  transition_reveal(year) -> mba
anim_save('boy.gif', animation = mba)

On a side note, it’s pretty interesting that John and Mary are the most popular names at the start in 1880. A proportion of 0.0815 of boys born in 1880 were named John, compared with 0.0724 of girls named Mary, the most popular girl name. I have a feeling that these two names were so popular because they are biblical names. More on this in a bit.
Above, we have an animation of the top 5 baby names for boys (John, James, William, Robert, and Charles, as the table above showed) from 1880 to 2017. Interestingly enough, the proportions for the boy names appear to be lower than those for the girl names, excluding John. This can be seen in the y-axes: the one for boys goes quite a bit lower.
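Because the two animations are drawn on separate y-axes, comparing them by eye is error-prone. One hedged way to check the comparison is to draw both sets of names on a shared scale. This sketch is not part of the original analysis; it assumes the babynames, dplyr, and ggplot2 packages loaded earlier, and the names girls, boys, top_names, and p are made up for illustration.

```r
library(dplyr)
library(ggplot2)
library(babynames)

girls <- c("Mary", "Elizabeth", "Margaret", "Helen", "Anna")
boys  <- c("John", "James", "William", "Robert", "Charles")

# Keep each name only for the sex it was ranked under
top_names <- babynames %>%
  filter((sex == "F" & name %in% girls) |
         (sex == "M" & name %in% boys))

# facet_wrap() uses the same y-axis for every panel by default,
# so the heights of the lines can be compared directly
p <- ggplot(top_names, aes(x = year, y = prop, group = name, color = name)) +
  geom_line() +
  facet_wrap(~ sex) +
  theme_bw() +
  labs(x = "Year", y = "Proportion with the Name", color = "Name")
p
```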
As to why this is the case, I am sure that others have better ideas, but if I had to guess, it could be that parents were more creative with boys’ names. Perhaps a wider variety of names was acceptable (Michael, Joseph, David, George, Thomas). I am assuming that many parents gave their children biblical names, and I think that there are more male names in the Bible than female names. Historically speaking, the 1880s to the 1960s really emphasized uniformity. The 1880s were really the start of American industry, which brought increased work hours for most Americans, so I don’t think that many Americans were focused on their children’s names. The same goes for the 1920s-1940s, a period in which men attempted to dress like each other, emphasizing the top hat and such. The 1940s-1950s emphasized conformity as well, but this began to change around the 1960s, with changes in fashion and such.
Furthermore, something tells me that around 1945, really the beginning of the rise of American dominance of the global economy, Americans had more leisure time. They also had more children because of the Baby Boom and, as a result, needed more names for them.
This is really just my hunch and I don’t have any concrete evidence to suggest so, so let’s go check it out.
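# Rank girl names by mean yearly proportion, 1945 onwards
babynames %>%
  filter(sex == "F", year >= 1945) %>%
  group_by(name) %>%
  summarize(Mean_Proportion = mean(prop)) %>%
  arrange(desc(Mean_Proportion))

# Keep only the five top-ranked girl names from 1945 onwards
fb45 <- babynames %>%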
  filter(sex == "F", year >= 1945) %>%
  filter(name %in% c("Mary", "Jennifer", "Linda", "Elizabeth", "Patricia"))

ggplot(fb45, aes(x = year,
                 y = prop,
                 group = name,
                 color = factor(name))) +
  geom_line() +
  theme_bw() +
  labs(x = "Year", y = "Proportion with the Name",
       title = "Top 5 Popular Girl Names", subtitle = "1945-2017") +
  #scale_color_viridis_d() +
  #scale_x_log10() +
  transition_reveal(year) -> fba45
anim_save('girl2.gif', animation = fba45)

So looking at the top 5 names for 1945 onwards, as measured by the mean proportion of babies with each name from 1945 to 2017, Mary is still on top. Linda peaks for a bit but then drops off, as does Patricia to a lesser extent. Jennifer peaks much later, roughly around the 1970s.
So, there’s nothing really conclusive here: the names have changed from what we looked at previously. Maybe the baby boom isn’t really the reason, and it is just generational change in name preferences.
Thinking about it more carefully, parents are probably less inclined to give their children their own names, so this may have to do with more than just the baby boom.
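# Rank boy names by mean yearly proportion, 1945 onwards
babynames %>%
  filter(sex == "M", year >= 1945) %>%
  group_by(name) %>%
  summarize(Mean_Proportion = mean(prop)) %>%
  arrange(desc(Mean_Proportion))

# Keep only the five top-ranked boy names from 1945 onwards
mb45 <- babynames %>%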
  filter(sex == "M", year >= 1945) %>%
  filter(name %in% c("Michael", "James", "John", "David", "Robert"))

ggplot(mb45, aes(x = year,
                 y = prop,
                 group = name,
                 color = factor(name))) +
  geom_line() +
  theme_bw() +
  labs(x = "Year", y = "Proportion with the Name",
       title = "Top 5 Popular Boy Names", subtitle = "1945-2017") +
  #scale_color_viridis_d() +
  #scale_x_log10() +
  transition_reveal(year) -> mba45
anim_save('boy2.gif', animation = mba45)

Interestingly enough, the most popular names did change quite a bit. We now have Michael on top instead of John (in terms of mean proportion from 1945 to 2017), while James still stays in second, ahead of John, David, and Robert.
It appears, once again, that the baby boom may not have affected this. This leads me to think that name trends are mostly driven by generation, and more research is needed. In other words, splitting the data into 30-year segments would give a better idea of trends in names.
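As a starting point for that follow-up, here is a minimal sketch of how the 30-year split could look. It is an assumption about how to do it, not something run for this post: the segment boundaries (starting at 1880) and the object names by_segment and top_by_segment are made up for illustration, and it reuses the same mean-proportion ranking as the tables above.

```r
library(dplyr)
library(babynames)

# Assign each year to a 30-year segment starting at 1880
# (1880-1909, 1910-1939, ...); the last segment is shorter
by_segment <- babynames %>%
  mutate(segment = 1880 + 30 * ((year - 1880) %/% 30)) %>%
  group_by(sex, segment, name) %>%
  summarize(Mean_Proportion = mean(prop), .groups = "drop")

# Top 5 names per sex within each segment
top_by_segment <- by_segment %>%
  group_by(sex, segment) %>%
  slice_max(Mean_Proportion, n = 5, with_ties = FALSE) %>%
  ungroup()

top_by_segment
```

Each segment then gets its own top-5 list, which could be plotted the same way as the animations above to see when one generation’s favorites give way to the next.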
Thank you so so much to Emily for helping me with my graphs and everything; truly helpful and I could not go anywhere without her patience.